Abstract

We propose a new MDS paradigm called reader-aware multi-document summarization (RA-MDS). Specifically, a set of reader comments associated with the news reports is also collected. The summaries generated from the reports for the event should be salient according to not only the reports but also the reader comments. To tackle this RA-MDS problem, we propose a sparse-coding-based method that is able to calculate the salience of the text units by jointly considering news reports and reader comments. Another reader-aware characteristic of our framework is to improve linguistic quality via entity rewriting. The rewriting consideration is jointly assessed together with other summarization requirements under a unified optimization model. To support the generation of compressive summaries via optimization, we explore a finer syntactic unit, namely, noun/verb phrase. In this work, we also generate a data set for conducting RA-MDS. Extensive experiments on this data set and some classical data sets demonstrate the effectiveness of our proposed approach.

1 Introduction 

In the typical multi-document summarization (MDS) setting, the input is a set of documents/reports about the same topic/event. The reports on the same event normally cover many aspects, and the continuous follow-up reports bring in more information about it. Therefore, it is very challenging to generate a short and salient summary for an event. MDS has drawn some attention and some methods have been proposed. For example, Wan et al. [2007] proposed an extraction-based approach that employs a manifold ranking method to calculate the salience of each sentence. Filatova and Hatzivassiloglou [2004] modeled the MDS task as an instance of the maximum coverage set problem. Gillick and Favre [2009] developed an exact solution for a model similar to [Filatova and Hatzivassiloglou, 2004] based on the weighted sum of the concepts (approximated by bigrams). [Li et al., 2013] proposed a guided sentence compression framework to generate compressive summaries by training a conditional random field (CRF) based on an annotated corpus. [Li et al., 2014] considered linguistic quality in their framework. [Ng et al., 2014] exploited timelines to enhance MDS. Moreover, many works [Liu et al., 2012; Kageback et al., 2014; Denil et al., 2014; Cao et al., 2015] utilized deep learning techniques to tackle summarization tasks.


As more and more user generated content is available, one natural extension of the setting is to incorporate such content regarding the event so as to directly or indirectly improve the generated summaries with greater user satisfaction. In this paper, we investigate a new setting in this direction. Specifically, a set of reader comments associated with the news reports is also collected. The summaries generated from the reports for the event should be salient according to not only the reports but also the reader comments. We name this extended paradigm reader-aware multi-document summarization (RA-MDS).

We give a real example taken from a data set collected by us to illustrate the importance of RA-MDS. One hot event in 2014 was “Malaysia Airlines jet MH370 disappeared”. After the outbreak of this event, many reports were posted by different news media. Most existing summarization systems can only create summaries with general information, e.g., “Flight MH370, carrying 227 passengers and 12 crew members, vanished early Saturday after departing Kuala Lumpur for Beijing”, because they extract information solely from the report content. However, after analyzing the reader comments, we find that many readers are interested in more specific aspects, such as “Military radar indicated that the plane may have turned from its flight route before losing contact” and “Two passengers who appear to have used stolen European passports to board”. Under the RA-MDS setting, one should jointly consider news and comments when generating the summary so that the summary content can cover not only important aspects of the event, but also aspects that attract reader interest as reflected in the reader comments.

No previous work has investigated how to incorporate the comments into the MDS problem. One challenge is how to conduct salience calculation by jointly considering the focus of news reports and the reader interests revealed by comments. Meanwhile, the model should not be sensitive to the availability of diverse aspects of reader comments. Another challenge is that reader comments are very noisy, both grammatically and informatively. Some previous works explore the effect of comments or social contexts in single document summarization (such as blog summarization) [... et al., 2008; Yang et al., 2011]. However, the problem setting of RA-MDS is more challenging because the considered comments are about an event with multiple reports spanning a time period, resulting in diverse and noisy comments.

To tackle the above challenges, we propose a sparse-coding-based method that is able to calculate the salience of the text units by jointly considering news reports and reader comments. Intuitively, the nature of summarization is to select a small number of semantic units to reconstruct the original semantic space of the whole topic. In our RA-MDS setting, the semantic space incorporates both the news and reader comments. The selected semantic units are sparse and hold the semantic diversity property. Then one issue is how to find these sparse and diverse semantic units efficiently without supervised training data. Sparse coding is a suitable method for learning sets of over-complete bases to represent data efficiently, and it has been demonstrated to be very useful in computer vision [Mairal et al., 2014]. Moreover, sparse coding can jointly consider news and comments to select semantic units in a very simple and elegant way, by just adding a comments reconstruction error item into the original loss function. Currently, there are only a few works employing sparse coding for the summarization task. DSDR [He et al., 2012] represents each sentence as a non-negative linear combination of summary sentences, but this method does not consider sparsity. MDS-Sparse [Liu et al., 2015] proposed a two-level sparse representation model, considering coverage, sparsity, and diversity, but their results do not show a significant improvement. In this paper, we propose a more efficient and direct sparse model to tackle these problems and achieve encouraging results on different data sets.

Another reader-aware characteristic of our framework is to improve linguistic quality via entity rewriting. Summaries may contain phrases that are not understandable out of context since the sentences compiled from different documents might contain too little, too much, or repeated information about the referent. A human summary writer uses the full-form mention (e.g. President Barack Obama) of an entity only once and uses the short-form mention (e.g. Obama) in the other places. Analogously, for a particular entity, our framework requires that the full-form mention of the entity should appear only once in the summary and its other appearances should use the most concise form. Some early works perform rewriting along with the greedy selection of individual sentences [Nenkova, 2008]. Some other works perform summary rewriting as a post-processing step [Siddharthan et al., 2011]. In contrast with such works, the rewriting consideration in our framework is jointly assessed together with other summarization requirements under a unified optimization model. This brings two advantages. First, the assessment of the rewriting operation is jointly considered with the generation of the compressive summary so that it has a global view to generate better rewriting results. Second, we can make full use of the length limit because the effect of the rewriting operation on summary length is simultaneously considered with other constraints in the model. To support the generation of compressive summaries via optimization, we explore a finer syntactic unit, namely, noun/verb phrase. Precisely, we first decompose the sentences into noun/verb phrases, and the salience of each phrase is calculated by jointly considering its importance in reports and comments.

Figure 1: Our RA-MDS framework.

In this work, we also generate a data set for conducting RA-MDS. Extensive experiments on our data set and some benchmark data sets have been conducted to examine the efficacy of our framework.

2 Description of the Proposed Framework 
2.1 Overview 

To tackle the RA-MDS problem, we propose an unsupervised compressive summarization framework. The overview of our framework is depicted in Fig. 1. A sparse-coding-based method is proposed to reconstruct the semantic space of a topic, revealed by both the news sentences, i.e., $x_i$'s, and the comment sentences, i.e., $z_i$'s, on the news sentences. Thus, an expressiveness score $a_i$ is designed for each news sentence. The dashed boxes of comment sentences indicate that a special treatment is applied to comments to avoid noise in the reconstruction. The details will be introduced in Section 2.2. The compression is carried out by deleting the unimportant constituents, i.e. phrases, of the input sentence. We first decompose each sentence into noun phrases (NPs) and verb phrases (VPs). The salience of a phrase depends on two criteria, namely, the expressiveness score inherited from the sentence, and the concept score of the phrase. The extraction of phrases and the calculation of phrase salience will be introduced in Section 2.3. Our framework carries out mention rewriting for entities to improve the linguistic quality of our summary. Specifically, we rewrite the mentions of three types of named entities, namely, person, location, and organization. We will discuss the details of mention detection, mention cluster merging, and short-form and full-form mention finding in Section 2.4. After the above preparation steps, we will introduce our summarization model in Section 2.5. Our model simultaneously performs sentence compression and mention rewriting via a unified optimization method. Meanwhile, a variety of summarization requirements are considered by formulating them as constraints.

2.2 Reader-Aware Sentence Expressiveness 

Intuitively, the nature of summarization is to select semantic units which can be used to reconstruct the original semantic space of the topic. The expressiveness score of a sentence in the news is defined as its contribution in constructing the semantic space of the topic from both the news content and the reader comments. Therefore, the expressiveness conveys the attention that a sentence attracts from both the news writers and the readers. We propose a sparse coding model to compute such expressiveness scores.

In typical sparse coding, the aim is to find a set of basis vectors $\phi_j$ which can be used to reconstruct $m$ target/input vectors $x_i$ as a linear combination of them so as to minimize the following loss function:

$$\min_{a,\phi} \sum_{i=1}^{m} \Big( \Big\| x_i - \sum_{j=1}^{k} a_{ij}\,\phi_j \Big\|_2^2 + \lambda \sum_{j=1}^{k} S(a_{ij}) \Big) \tag{1}$$

where $S(\cdot)$ is a sparsity cost function which penalizes $a_{ij}$ for being far from zero.

In our summarization task, each topic contains a set of news reports and a set of reader comments. After stemming and stop-word removal, we build a dictionary for the topic by using unigrams and bigrams from the news. Then, each sentence of news and comments is represented as a weighted term-frequency vector. Let $X = \{x_1, x_2, \ldots, x_m\}$ and $Z = \{z_1, z_2, \ldots, z_n\}$ denote the vectors of sentences from news and comments respectively, where $x_i \in \mathbb{R}^d$ and $z_i \in \mathbb{R}^d$ are term-frequency vectors. There are $d$ terms in the dictionary, $m$ sentences in news, and $n$ sentences in comments for each topic. We take semantic units as sentences here, and assume that for each sentence $x_i$, there is a coefficient variable $a_i$, named expressiveness score, to represent the contribution of this sentence in the reconstruction.
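The vector construction described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the stop-word list is a toy assumption, stemming is omitted, raw counts stand in for the unspecified term weighting, and the names `terms`, `build_dictionary`, and `vectorize` are hypothetical.

```python
from collections import Counter

# Toy stop-word list (an assumption; the paper does not specify one).
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is"}


def terms(sentence):
    """Unigrams and bigrams of a sentence after stop-word removal (no stemming here)."""
    tokens = [w for w in sentence.lower().split() if w not in STOP_WORDS]
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


def build_dictionary(news_sentences):
    """Build the dictionary from news only, so comment-only terms never enter it."""
    vocab = sorted({t for s in news_sentences for t in terms(s)})
    return {term: idx for idx, term in enumerate(vocab)}


def vectorize(sentence, dictionary):
    """Term-frequency vector of length d; terms outside the dictionary are ignored."""
    counts = Counter(t for t in terms(sentence) if t in dictionary)
    return [float(counts.get(term, 0)) for term in dictionary]


news = ["the plane vanished", "radar tracked the plane"]
comments = ["radar lost plane lol"]
dictionary = build_dictionary(news)
X = [vectorize(s, dictionary) for s in news]      # m news vectors
Z = [vectorize(s, dictionary) for s in comments]  # n comment vectors
```

Because the dictionary is derived from the news side only, both `X` and `Z` live in the same d-dimensional space, which is what the joint reconstruction requires.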

Based on the spirit of sparse coding, we directly regard each news sentence $x_i$ as a candidate basis vector, and all $x_i$'s are employed to reconstruct the semantic space of the topic, including $X$ and $Z$. Thus we propose a preliminary error formulation, expressed in Eq. 2, which we aim to minimize:

$$\frac{1}{2m} \sum_{i=1}^{m} \Big\| x_i - \sum_{j=1}^{m} a_j x_j \Big\|_2^2 + \frac{1}{2n} \sum_{i=1}^{n} \Big\| z_i - \sum_{j=1}^{m} a_j x_j \Big\|_2^2 \tag{2}$$

where the coefficients $a_j$ are the expressiveness scores and all the target vectors share the same coefficient vector $A$ here.

To harness the characteristics of the summarization problem setting more effectively, we refine the preliminary error formulation given in Eq. 2 along three directions. (1) As mentioned before, the original sentence vector space can be constructed by a subset of the sentences, i.e., the number of summary sentences is sparse, so we put a sparsity constraint on the coefficient vector $A$ by adding the $L_1$-norm term $\lambda\|A\|_1$ to Eq. 2, with the weight $\lambda$ as a scaling constant to determine its relative importance. Moreover, we only consider non-negative linear reconstruction in our framework, so we add non-negative constraints on the coefficients. (2) As previous work [Ng et al., 2011] mentioned, some prior knowledge can benefit the sentence expressiveness detection performance, e.g., sentence position. So we add a variable $\rho_i$ to weight each news-sentence reconstruction error. Here, we employ the position information to generate $\rho_i$:

$$\rho_i = \begin{cases} C^{p}, & \text{if } p < \bar{p} \\ C^{\bar{p}}, & \text{otherwise} \end{cases} \tag{3}$$

where $p$ is the paragraph ID in each document starting from 0, and $C$ is a positive constant smaller than 1. (3) Besides useful information, comments usually introduce a lot of noise. To tackle this problem, our first step is to eliminate terms that appear only in comments; another step is to add a parameter $\tau_i$ to control the comment-sentence reconstruction error. Because the semantic units of the generated summaries all come from news, intuitively, a comment sentence will introduce more information if it is more similar to the news. Therefore, we employ the mean cosine similarity between the comment sentence $z_i$ and all the news sentences $X$ as the weight variable $\tau_i$.
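The two reconstruction weights can be illustrated with a short sketch. It assumes the position weight decays as a power of C up to a paragraph threshold and stays constant afterwards, with the defaults C = 0.8 and threshold 4 taken from the parameter settings in Section 3.1; the function names and the `p_bar` parameter name are hypothetical.

```python
import math


def position_weight(paragraph_id, C=0.8, p_bar=4):
    """News weight: C**p for early paragraphs, constant C**p_bar from p_bar onwards."""
    return C ** paragraph_id if paragraph_id < p_bar else C ** p_bar


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def comment_weight(z, news_vectors):
    """Comment weight: mean cosine similarity between z and every news vector."""
    return sum(cosine(z, x) for x in news_vectors) / len(news_vectors)


rho = [position_weight(p) for p in (0, 1, 6)]          # one weight per news sentence
tau = comment_weight([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

A comment that shares no dictionary terms with the news receives weight 0 and therefore contributes nothing to the loss, which matches the intent of damping noisy comments.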

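After the above considerations, we have the global loss function as follows:

$$\min_{A} J = \frac{1}{2m} \sum_{i=1}^{m} \rho_i \Big\| x_i - \sum_{j=1}^{m} a_j x_j \Big\|_2^2 + \frac{1}{2n} \sum_{i=1}^{n} \tau_i \Big\| z_i - \sum_{j=1}^{m} a_j x_j \Big\|_2^2 + \lambda \|A\|_1 \tag{4}$$

$$\text{s.t. } a_j \ge 0 \text{ for } j \in \{1, \ldots, m\},\ \lambda > 0$$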


For the optimization problem of sparse coding, there are already many classical algorithms [Mairal et al., 2014]. In this paper, we utilize the Coordinate Descent method as shown in Algorithm 1. Under the iterative updating rule in Eq. 7, the objective function $J$ is non-increasing, and the convergence of the iteration is guaranteed.

Our sparse coding model introduces several advantages. First, sparse coding is a class of unsupervised methods, so no manual annotations for training data are needed. Second, the optimization procedure is modular, which makes it easy to plug in different loss functions. Third, our model incorporates semantic diversity naturally, as mentioned in [He et al., 2012]. Last but not least, it helps the subsequent unified optimization component which generates compressive summaries. In particular, it reduces the number of variables because the sparsity constraint can generate sparse expressiveness scores, i.e., most of the sentences get a 0 score.
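The optimization of the weighted loss can be sketched in pure Python as below. This is an illustrative reading of the coordinate descent procedure, not the authors' code: the clipping at zero is one simple way to impose non-negativity, the threshold is scaled by the step size (which coincides with thresholding by the penalty weight when the learning rate is 1), the stopping tolerance `eps=1e-4` is an assumed value, and all function names are hypothetical.

```python
def soft_threshold(value, threshold):
    """S_t(v) = sign(v) * max(|v| - t, 0)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def reconstruct(A, X):
    """x_bar = sum_j a_j * x_j, shared by every target vector."""
    return [sum(a * x[t] for a, x in zip(A, X)) for t in range(len(X[0]))]


def loss(A, X, Z, rho, tau, lam):
    """Weighted news and comment reconstruction errors plus the L1 penalty."""
    x_bar = reconstruct(A, X)

    def err(v):
        return sum((a - b) ** 2 for a, b in zip(v, x_bar))

    news = sum(r * err(x) for r, x in zip(rho, X)) / (2 * len(X))
    comm = sum(w * err(z) for w, z in zip(tau, Z)) / (2 * len(Z)) if Z else 0.0
    return news + comm + lam * sum(abs(a) for a in A)


def expressiveness(X, Z, rho, tau, lam=0.005, eta=1.0, T=300, eps=1e-4):
    """Return one non-negative, mostly zero score per news sentence."""
    A = [0.0] * len(X)
    previous = loss(A, X, Z, rho, tau, lam)
    for _ in range(T):
        x_bar = reconstruct(A, X)
        res_x = [[a - b for a, b in zip(x, x_bar)] for x in X]
        res_z = [[a - b for a, b in zip(z, x_bar)] for z in Z]
        grad = []
        for x_k in X:  # partial derivative of the two reconstruction terms
            g = -sum(r * dot(res, x_k) for r, res in zip(rho, res_x)) / len(X)
            if Z:
                g -= sum(w * dot(res, x_k) for w, res in zip(tau, res_z)) / len(Z)
            grad.append(g)
        k = max(range(len(X)), key=lambda j: abs(grad[j]))  # steepest coordinate
        A[k] = max(soft_threshold(A[k] - eta * grad[k], eta * lam), 0.0)
        current = loss(A, X, Z, rho, tau, lam)
        if abs(previous - current) < eps:
            break
        previous = current
    return A


# Toy topic: two orthogonal news sentences, one comment identical to the first.
scores = expressiveness([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]],
                        rho=[1.0, 0.8], tau=[1.0], lam=0.5, eta=0.5)
```

In the toy call the second news sentence is supported by neither the comment nor a strong position weight, so its score stays at zero, illustrating how the sparsity term prunes candidate sentences before the later optimization stage. A fixed step size is only safe when the vectors are suitably scaled, which is why the example uses unit vectors and a reduced step.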


2.3 Phrase Extraction and Salience Calculation 

We employ the Stanford parser [Klein and Manning, 2003] to obtain a constituency tree for each input sentence. After that, we extract NPs and VPs from the tree as follows: (1) The NPs and VPs that are the direct children of the S node are extracted. (2) VPs (NPs) in a path on which all the nodes are VPs (NPs) are also recursively extracted and regarded as having the same parent node S. The recursive operation in the second step will only be carried out for two levels since the phrases in the lower levels may not be able to convey a complete fact.











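Algorithm 1 Coordinate descent algorithm for sentence expressiveness detection

Input: News sentences $X \in \mathbb{R}^{d \times m}$, comment sentences $Z \in \mathbb{R}^{d \times n}$, news reconstruction weights $\rho_i$, comment reconstruction weights $\tau_i$, penalty parameter $\lambda$, and stopping criteria $T$ and $\epsilon$.

Output: Salience vector $A^* \in \mathbb{R}^m$.

1: Initialize $A \leftarrow 0$, $t \leftarrow 0$;
2: while $t < T$ and $J_\epsilon^t > \epsilon$ do
3:   reconstructing:
     $$\bar{x} = \sum_{j=1}^{m} a_j^t x_j \tag{5}$$
4:   take partial derivatives for reconstruction error items:
     $$\frac{\partial J}{\partial a_k} = -\frac{1}{m} \sum_{i=1}^{m} \rho_i (x_i - \bar{x})^\top x_k - \frac{1}{n} \sum_{i=1}^{n} \tau_i (z_i - \bar{x})^\top x_k \tag{6}$$
5:   select the coordinate with maximum partial derivative:
     $$\hat{k} = \arg\max_{k = 1 \ldots m} \Big| \frac{\partial J}{\partial a_k} \Big|$$
6:   update the coordinate by soft-thresholding [Donoho and Johnstone, 1994]:
     $$a_{\hat{k}}^{t+1} \leftarrow S_\lambda \Big( a_{\hat{k}}^{t} - \eta \frac{\partial J}{\partial a_{\hat{k}}} \Big) \tag{7}$$
     where $S_\lambda : a \mapsto \mathrm{sign}(a) \max(|a| - \lambda, 0)$.
7:   $J_\epsilon^t \leftarrow J_{A^{t+1}} - J_{A^t}$, $t \leftarrow t + 1$
8: end while
9: return $A^* = A$.

Figure 2: The constituency tree of a sentence.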

Take the tree in Fig. 2 as an example: the corresponding sentence is decomposed into the phrases “An armed man”, “walked into an Amish school, sent the boys outside and tied up and shot the girls, killing three of them”, “walked into an Amish school”, “sent the boys outside”, and “tied up and shot the girls, killing three of them”.¹

¹ Because of the recursive operation, the extracted phrases may have overlaps. Later, we will show how to avoid such overlapping in phrase extraction. We only consider the recursive operation for a VP with more than one parallel sub-VP, such as the highest VP in Fig. 2. The sub-VPs following modal, link or auxiliary verbs are not extracted as individual VPs. In addition, we also extract the clauses functioning as subjects of sentences as NPs, such as “that clause”. Note that we also refer to such clauses as “noun phrases” although their labels in the tree could be “SBAR” or “S”.

The salience of a phrase depends on two criteria. The first criterion is the expressiveness score which is inherited from the corresponding sentence in the output of our sparse coding model. The second criterion is the concept score that conveys the overall importance of the individual concepts in the phrase. Let $tf(t)$ be the frequency of the term $t$ (unigram/bigram) in the whole topic. The salience $S_i$ of the phrase $P_i$ is defined as:

$$S_i = a_i \times \frac{\sum_{t \in P_i} tf(t)}{\sum_{t \in Topic} tf(t)} \tag{8}$$

where $a_i$ is the expressiveness of the sentence containing $P_i$.
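As a small worked illustration, the phrase salience can be read as the sentence's expressiveness score scaled by the share of the topic's total term frequency that the phrase covers. The sketch below follows that reading; treating the phrase as a set of distinct terms is an assumption, and `phrase_salience` is a hypothetical name.

```python
def phrase_salience(phrase_terms, topic_tf, sentence_score):
    """Expressiveness of the parent sentence times the phrase's share of topic tf."""
    total = sum(topic_tf.values())
    covered = sum(topic_tf.get(t, 0) for t in set(phrase_terms))
    return sentence_score * covered / total if total else 0.0


topic_tf = {"plane": 3, "radar": 1}              # term frequencies over the whole topic
s = phrase_salience(["plane"], topic_tf, 0.5)    # 0.5 * 3 / 4
```

A phrase from a sentence with zero expressiveness always gets zero salience, so the sparse scores directly shrink the set of phrases that the later optimization needs to consider.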


2.4 Preparation of Entity Mentions for Rewriting 

We first conduct co-reference resolution for each document using the Stanford co-reference resolution package [Lee et al., 2013]. We adopt those resolution rules that are able to achieve high quality and address our need for summarization. In particular, Sieves 1, 2, 3, 4, 5, 9, and 10 in the package are employed. A set of clusters is obtained and each cluster contains the mentions corresponding to the same entity in a document. The clusters from different documents in the same topic are merged by matching the named entities. Three types of entities are considered, namely, person, location, and organization.

Let $M$ denote the mention cluster of an entity. The full-form mention $m^f$ is determined as:

$$m^f = \arg\max_{m \in M} \sum_{t \in m} tf'(t) \tag{9}$$

where $tf'(t)$ is calculated in $M$. We do not simply select the longest one since it could be too verbose. The short-form mention $m^s$ is determined as:

$$m^s = \arg\max_{m \in M'} \sum_{t \in m} tf'(t) \tag{10}$$

where $M'$ contains the mentions that are the shortest and meanwhile are not pronouns.
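The selection of the two mention forms can be sketched as follows, assuming mentions are plain strings, length is measured in tokens, and term frequencies are counted over all tokens in the cluster. The pronoun list and the name `select_mentions` are illustrative assumptions.

```python
from collections import Counter

# Small illustrative pronoun list (an assumption, not from the paper).
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them", "his", "its", "their"}


def select_mentions(cluster):
    """Return (full_form, short_form) for a list of mention strings of one entity."""
    tokenized = [m.split() for m in cluster]
    tf = Counter(tok.lower() for toks in tokenized for tok in toks)  # tf'(t) within M

    def score(toks):
        return sum(tf[t.lower()] for t in toks)

    full = max(tokenized, key=score)
    # M': the shortest mentions that are not pronouns (fall back to all mentions).
    candidates = [t for t in tokenized
                  if not (len(t) == 1 and t[0].lower() in PRONOUNS)] or tokenized
    shortest = min(len(t) for t in candidates)
    short = max((t for t in candidates if len(t) == shortest), key=score)
    return " ".join(full), " ".join(short)


forms = select_mentions(["President Barack Obama", "Obama", "Barack Obama", "he", "Obama"])
```

For this cluster the full form is the mention whose tokens are most frequent within the cluster, and the short form is the most frequent one-token, non-pronoun mention.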


2.5 Unified Optimization Framework 

The objective function of our optimization formulation is defined as:

$$\max \Big\{ \sum_{i} \alpha_i S_i - \sum_{i<j} \alpha_{ij} (S_i + S_j) R_{ij} \Big\} \tag{11}$$

where $\alpha_i$ is the selection indicator for the phrase $P_i$, $S_i$ is the salience score of $P_i$, and $\alpha_{ij}$ and $R_{ij}$ are the co-occurrence indicator and the similarity of a pair of phrases $(P_i, P_j)$ respectively. The similarity is calculated with a Jaccard-index-based method. Specifically, this objective maximizes the salience score of the selected phrases, as indicated by the first term, and penalizes the selection of similar phrase pairs. The constraints ensuring that the selected phrases can form compressive sentences, and the constraints for entity rewriting, are given below. Note that the rewriting consideration is conducted for different candidates for the purpose of assessing the effects on summarization in the optimization framework. Consequently, no actual permanent rewriting operations are conducted during the optimization process. The actual rewriting operations will be carried out on the selected phrases output by the optimization component in the post-processing stage.

Compressive sentence generation. Let $\beta_k$ denote the selection indicator of sentence $x_k$. If any phrase from $x_k$ is selected, $\beta_k = 1$. Otherwise, $\beta_k = 0$. For generating a compressed summary sentence, it is required that if $\beta_k = 1$, at least one NP and at least one VP of the sentence should be selected. It is expressed as:

$$\forall P_i \in x_k \wedge P_i \text{ is an NP},\quad \alpha_i \le \beta_k \ \wedge\ \sum_i \alpha_i \ge \beta_k, \tag{12}$$

$$\forall P_i \in x_k \wedge P_i \text{ is a VP},\quad \alpha_i \le \beta_k \ \wedge\ \sum_i \alpha_i \ge \beta_k. \tag{13}$$

Entity rewriting. Let $P_M$ be the phrases that contain the entity corresponding to the cluster $M$. For each $P_i \in P_M$, two indicators $\gamma_i^f$ and $\gamma_i^s$ are defined. $\gamma_i^f$ indicates that the entity in $P_i$ is rewritten by the full-form, while $\gamma_i^s$ indicates that the entity in $P_i$ is rewritten by the short-form. To adopt our rewriting strategy, we design the following constraints:

$$\text{if } \exists P_i \in P_M \wedge \alpha_i = 1,\quad \sum_{P_j \in P_M} \gamma_j^f = 1, \tag{14}$$

$$\text{if } P_i \in P_M \wedge \alpha_i = 1,\quad \gamma_i^f + \gamma_i^s = 1. \tag{15}$$

Note that if a phrase contains several mentions of the same entity, we can safely rewrite the latter appearances with the short-form mention and we only need to decide the rewriting strategy for the first appearance.

Not i-within-i. Two phrases in the same path of the constituency tree cannot be selected at the same time:

$$\text{if } \exists\, P_k \leadsto P_j, \text{ then } \alpha_k + \alpha_j \le 1, \tag{16}$$

where $P_k \leadsto P_j$ denotes that $P_k$ and $P_j$ lie on the same path. For example, “walked into an Amish school, sent the boys outside and tied up and shot the girls, killing three of them” and “walked into an Amish school” cannot both be selected.

Phrase co-occurrence. These constraints control the co-occurrence relation of two phrases:

$$\alpha_{ij} - \alpha_i \le 0,\quad \alpha_{ij} - \alpha_j \le 0,\quad \alpha_i + \alpha_j - \alpha_{ij} \le 1. \tag{17}$$

The first two constraints state that if the summary includes both the units $P_i$ and $P_j$, then we have to include them individually. The third constraint is the inverse of the first two.

Short sentence avoidance. We do not select the VPs from sentences shorter than a threshold, because a short sentence normally cannot convey a complete key fact.

Pronoun avoidance. As previously observed [Woodsend and Lapata, 2012], pronouns are normally not used by human summary writers. We exclude the NPs that are pronouns from being selected.


Length constraint. The overall length of the selected NPs and VPs is no larger than a limit $L$. Note that the length calculation considers the effect of rewriting operations via the rewriting indicators.

The objective function and constraints are linear, so the optimization can be solved by existing Integer Linear Programming (ILP) solvers, e.g., those based on the simplex algorithm [Dantzig and Thapa, 1997]. In the implementation, we use a package called lp_solve².
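To make the interplay of the objective and constraints concrete, the toy sketch below enumerates every 0/1 selection instead of calling an ILP solver, which is only feasible for a handful of phrases. It is not the authors' formulation: it assumes the penalty for a jointly selected pair is the sum of the two saliences times their Jaccard similarity, treats the pair indicator as the product of the two selection indicators, measures length in words, and omits the rewriting indicators. All names and data are hypothetical.

```python
from itertools import combinations, product


def jaccard(a, b):
    """Jaccard index over lower-cased word sets."""
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def feasible(sel, phrases, conflicts, max_len):
    chosen = [p for p, on in zip(phrases, sel) if on]
    if sum(len(p["text"].split()) for p in chosen) > max_len:   # length constraint
        return False
    if any(sel[i] and sel[j] for i, j in conflicts):            # same-path phrases
        return False
    for sent in {p["sent"] for p in chosen}:                    # NP and VP per sentence
        if {p["kind"] for p in chosen if p["sent"] == sent} != {"NP", "VP"}:
            return False
    return True


def objective(sel, phrases):
    idx = [i for i, on in enumerate(sel) if on]
    gain = sum(phrases[i]["salience"] for i in idx)
    penalty = sum((phrases[i]["salience"] + phrases[j]["salience"])
                  * jaccard(phrases[i]["text"], phrases[j]["text"])
                  for i, j in combinations(idx, 2))
    return gain - penalty


def solve(phrases, conflicts, max_len):
    """Exhaustive search; the empty selection is always feasible."""
    candidates = (s for s in product((0, 1), repeat=len(phrases))
                  if feasible(s, phrases, conflicts, max_len))
    best = max(candidates, key=lambda s: objective(s, phrases))
    return [p["text"] for p, on in zip(phrases, best) if on]


phrases = [
    {"text": "an armed man", "kind": "NP", "sent": 0, "salience": 0.30},
    {"text": "walked into a school", "kind": "VP", "sent": 0, "salience": 0.40},
    {"text": "sent the boys outside", "kind": "VP", "sent": 0, "salience": 0.20},
    {"text": "police", "kind": "NP", "sent": 1, "salience": 0.10},
    {"text": "arrived later", "kind": "VP", "sent": 1, "salience": 0.05},
]
summary = solve(phrases, conflicts=[], max_len=10)
```

With a ten-word budget the search keeps the subject and the most salient verb phrase of the first sentence and spends the remaining three words on the second sentence, rather than adding a second verb phrase that would exceed the limit. A real implementation would hand the same linear objective and constraints to an ILP solver.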


2.6 Postprocessing 

The pseudo-timestamp of a summary sentence is defined as the timestamp of the corresponding document. The sentences are ordered based on their pseudo-timestamps. The sentences from the same document are ordered according to their original order. Finally, we conduct the appropriate entity rewriting as indicated by the optimization output.
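The ordering rule can be expressed as a single sort. The sketch assumes each summary sentence carries its document timestamp, a document identifier, and its position in that document; these field names are hypothetical.

```python
def order_summary(sentences):
    """Sort by document timestamp, then keep original order within each document."""
    return sorted(sentences, key=lambda s: (s["doc_time"], s["doc_id"], s["position"]))


selected = [
    {"text": "b", "doc_time": 2, "doc_id": "d2", "position": 0},
    {"text": "a2", "doc_time": 1, "doc_id": "d1", "position": 3},
    {"text": "a1", "doc_time": 1, "doc_id": "d1", "position": 1},
]
ordered = [s["text"] for s in order_summary(selected)]
```

Including the document identifier in the key keeps sentences from two documents that share a timestamp from being interleaved.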


3 Experiments 

3.1 Experimental Setting 

Our data set. Our data set contains 37 topics. Each topic contains 10 related news reports and at least 200 reader comments. For each topic, we employ summary writers with a journalist background to write model summaries. When writing summaries, they take into account the interest of readers by digesting the reader comments of the event. 3 model summaries are written for each topic. We also have a separate development (tuning) set containing 24 topics and each topic has one model summary.

DUC. In order to show that our sparse-coding-based framework can also work well on the traditional MDS task, we employ the benchmark data sets DUC 2006 and DUC 2007 for evaluation. DUC 2006 and DUC 2007 contain 50 and 45 topics respectively. Each topic has 25 news documents and 4 model summaries. The length of the model summary is limited to 250 words.

Evaluation metric. We use the ROUGE score as our evaluation metric [Lin, 2004]³, and the F-measures of ROUGE-1, ROUGE-2 and ROUGE-SU4 are reported.
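For readers unfamiliar with the metric, the F-measure of ROUGE-N can be approximated with clipped n-gram overlap as sketched below. This is a simplified single-reference illustration with whitespace tokenization and no stemming; it is not the official ROUGE toolkit used for the reported numbers, and it does not cover ROUGE-SU4.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f(candidate, reference, n=1):
    """F-measure of clipped n-gram overlap between one candidate and one reference."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())   # Counter intersection clips the counts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


f1 = rouge_n_f("the plane vanished", "the plane disappeared", n=1)
f2 = rouge_n_f("the plane vanished", "the plane disappeared", n=2)
```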

Parameter settings. We set $C = 0.8$ and the paragraph threshold $\bar{p} = 4$ in the position weight function. For the sparse coding model, we set the stopping criteria $T = 300$, $\epsilon$ = 10~ 2 3 4 , and the learning rate $\eta = 1$. For the sparsity item penalty, we set $\lambda = 0.005$.

3.2 Results on Our Data Set 

We compare our system with three summarization baselines. The Random baseline selects sentences randomly for each topic. The Lead baseline [Wasson, 1998] ranks the news chronologically and extracts the leading sentences one by one. MEAD [Radev et al., 2004]⁴ generates summaries using cluster centroids produced by a topic detection and tracking system.

As shown in Table 1, our system reports the best results on all ROUGE metrics. The reasons are as follows: (1) Our sparse coding model directly assigns coefficient values as expressiveness scores to the news sentences, which are obtained by minimizing the global semantic space reconstruction error and are able to precisely represent the importance of sentences. (2) The model can jointly consider news content and reader comments, taking into account more reader-aware information. (3) In our sparse coding model, we weight the reconstruction error by prior knowledge, i.e., paragraph position, which can improve the summarization performance significantly. (4) Our unified optimization framework can further filter the unimportant NPs and VPs and generate the compressed summaries. (5) We conduct entity rewriting in the unified optimization framework in order to improve the linguistic quality.

System      Rouge-1   Rouge-2   Rouge-SU4
Random      0.334     0.069     0.109
Lead        0.355     0.098     0.133
MEAD        0.406     0.127     0.161
Ours        0.438     0.155     0.186

Table 1: Results on our data set.

² http://lpsolve.sourceforge.net/5.5/
³ http://www.berouge.com
⁴ http://www.summarization.com/mead/

3.3 Results on DUC 

In order to illustrate the performance of our framework on the traditional MDS task, we compare it with several state-of-the-art systems on the standard DUC data sets. Our framework can still be used for the MDS task without reader comments by ignoring the components for comments.

Besides the Random and Lead methods, we compare our system with two other unsupervised sparse-coding-based methods, namely DSDR [He et al., 2012] and MDS-Sparse [Liu et al., 2015] (MDS-Sparse+div and MDS-Sparse-div). Because both the data sets and the evaluation metrics are standard, we directly take the results from their papers. The results are given in Tables 2 and 3. Our system outperforms the comparison methods on all reported metrics, for the reasons mentioned in Section 3.2.


3.4 Case Study 

Based on the news and comments of the topic “Bitcoin exchange Mt. Gox goes offline”, we generate two summaries with our model, considering comments (Ours) and ignoring comments (Ours-noC) respectively. The summaries and ROUGE evaluation are given in Table 4. All the ROUGE values of our model considering comments are better than those ignoring comments, with large gaps. The sentences in italic bold in the two summaries are different. By reviewing the comments of this topic, we find that many comments are talking about “The company had lost 744,000 Bitcoins ...” and “Anonymity prevents reversal of transactions.”, which are well identified by our model.


System      Rouge-1   Rouge-2   Rouge-SU4
Ours-noC    0.365     0.097     0.126

Mt. Gox went offline today, as trading on the Tokyo-based site came to a screeching halt. A withdrawal ban imposed at the exchange earlier this month. Deposits are insured by the government. The sudden closure of the Mt. Gox Bitcoin exchange sent the virtual currency to a three-month low on Monday the currency’s value has fallen to about $470 from $550 in the past few hours. The statement from the Bitcoin companies on Monday night, which was not signed by Mr. Silbert, are committed to the future of Bitcoin and the security of all customer funds.

Ours        0.414     0.124     0.164

Mt. Gox went offline today, as trading on the Tokyo-based site came to a screeching halt. The company had lost 744,000 Bitcoins in a theft that had gone unnoticed for years. The sudden closure of the Mt. Gox Bitcoin exchange sent the virtual currency to a three-month low on Monday. The currency's value has fallen to about $470 from $550 in the past few hours. Anonymity prevents reversal of transactions. The statement from the Bitcoin companies on Monday night, which was not signed by Mr. Silbert, are committed to the future of Bitcoin and the security of all customer funds.

Table 4: Generated summaries for the topic “Bitcoin exchange Mt. Gox goes offline”.


System           Rouge-1   Rouge-2   Rouge-SU4
Random           0.280     0.046     0.088
Lead             0.308     0.048     0.087
DSDR-non         0.332     0.060     -
MDS-Sparse+div   0.340     0.052     0.107
MDS-Sparse-div   0.344     0.051     0.107
Ours             0.391     0.081     0.136

Table 2: Results on DUC 2006.



System           Rouge-1   Rouge-2   Rouge-SU4
Random           0.302     0.046     0.088
Lead             0.312     0.058     0.102
DSDR-non         0.396     0.074     -
MDS-Sparse+div   0.353     0.055     0.112
MDS-Sparse-div   0.354     0.064     0.117
Ours             0.403     0.092     0.146

Table 3: Results on DUC 2007.


We also present an entity rewriting case study. For the person name “Dong Nguyen” in the topic “Flappy Bird”, the summary without entity rewriting contains different mention forms such as “Dong Nguyen”, “Dong” and “Nguyen”. After rewriting, “Dong” is replaced by “Nguyen”, which makes the co-reference mentions clearer. As expected, there is only one full-form mention, such as “Nguyen Ha Dong, a Hanoi-based game developer”, “Shuhei Yoshida, president of Sony Computer Entertainment Worldwide Studios”, and “The Australian Maritime Safety Authority’s Rescue Coordination Centre, which is overseeing the rescue”, in each summary.

4 Conclusion 

We propose a new MDS paradigm called reader-aware multi-document summarization (RA-MDS). To tackle this RA-MDS problem, we propose a sparse-coding-based method that jointly considers news reports and reader comments. We propose a compression-based unified optimization framework which explores a finer syntactic unit, namely, noun/verb phrase, to generate compressive summaries, and which meanwhile conducts entity rewriting aiming at better linguistic quality. In this work, we also generate a data set for the RA-MDS task. The experimental results show that our framework can achieve good performance and outperform state-of-the-art unsupervised systems.